-
Couldn't load subscription status.
- Fork 576
feat(tdigest): add the support of TDIGEST.REVRANK command #3130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution. Could you add some golang test cases for it? Refer to https://github.com/apache/kvrocks/blob/unstable/tests/gocase/unit/type/tdigest/tdigest_test.go. |
|
@PragmaTwice ok,I will add some golang test |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| { | ||
| LockGuard guard(storage_->GetLockManager(), ns_key); | ||
|
|
||
| if (auto status = getMetaDataByNsKey(ctx, ns_key, &metadata); !status.ok()) { | ||
| return status; | ||
| } | ||
|
|
||
| if (metadata.total_observations == 0) { | ||
| result->resize(inputs.size(), -2); | ||
| return rocksdb::Status::OK(); | ||
| } | ||
|
|
||
| if (metadata.unmerged_nodes > 0) { | ||
| auto batch = storage_->GetWriteBatchBase(); | ||
| WriteBatchLogData log_data(kRedisTDigest); | ||
| if (auto status = batch->PutLogData(log_data.Encode()); !status.ok()) { | ||
| return status; | ||
| } | ||
|
|
||
| if (auto status = mergeCurrentBuffer(ctx, ns_key, batch, &metadata); !status.ok()) { | ||
| return status; | ||
| } | ||
|
|
||
| std::string metadata_bytes; | ||
| metadata.Encode(&metadata_bytes); | ||
| if (auto status = batch->Put(metadata_cf_handle_, ns_key, metadata_bytes); !status.ok()) { | ||
| return status; | ||
| } | ||
|
|
||
| if (auto status = storage_->Write(ctx, storage_->DefaultWriteOptions(), batch->GetWriteBatch()); !status.ok()) { | ||
| return status; | ||
| } | ||
|
|
||
| ctx.RefreshLatestSnapshot(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @donghao526 ,
The merge action could be refactored into a function to reduce duplication of the same logic.
Best Regards,
Edward
src/types/redis_tdigest.cc
Outdated
| for (auto value : inputs) { | ||
| auto status_or_rank = TDigestRevRank(dump_centroids, value); | ||
| if (!status_or_rank) { | ||
| return rocksdb::Status::InvalidArgument(status_or_rank.Msg()); | ||
| } | ||
| result->push_back(*status_or_rank); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @donghao526 ,
We could sort the inputs and get the ranks with just one scan of the centroids since it's sorted.
Best Regards,
Edward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @LindaSummer
I encountered a problem when I was testing. After the nodes merged, are there two adjacent centroids can be with the same mean?
I Test with
TDIGEST.CREATE s COMPRESSION 1000
TDIGEST.ADD s 10 10 10 10 20 20
I found the centroids after merged are:
(1) mean: 10 weight: 1
(2) mean: 10 weight: 1
(3) mean: 10 weight: 1
(4) mean: 10 weight: 1
(5) mean: 20 weight: 1
(6) mean: 20 weight: 1
Is this as expected or a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @donghao526 ,
It is expected, and you could refer to #2878 for more details.
So we need a stable way for both serialization and deserialization.
The trigger for the merge is the weight, not the mean. So we could treat the mean only as a label of one centroid. The whole logic is driven by weight.
Best Regards,
Edward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
src/types/tdigest.h
Outdated
| } | ||
|
|
||
| template <typename TD> | ||
| inline StatusOr<int> TDigestRevRank(TD&& td, double value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @donghao526 ,
We need to use a stable way to compare between doubles.
It will be tough to assume that the two double numbers are equal to or greater than.
After solving this, we should add some test cases for this corner case.
Best Regards,
Edward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @donghao526 ,
Since the other code snippets use this way now. You could leave it with the current logic.
I will try to create a new PR to solve the unstable comparison problem in this file.
Best Regards,
Edward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, after your new PR, I can help to fix here.
…ks into feature/tdigest-revrank
|
@LindaSummer I have modified the code according to your suggestion, please review.
Best Regards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/types/tdigest.h
Outdated
| } | ||
|
|
||
| template <typename TD> | ||
| inline Status TDigestRank(TD&& td, const std::vector<double>& inputs, std::vector<int>& result) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems specialized for RevRank rather than Rank.
Could we implement the Rank, then wrap the Iterator to a reverse version with the same logic to construct RevRank?
|
Hi @donghao526 , Sorry to interrrupt you. Best Regards, |
Hi @LindaSummer , Sorry, I was very busy in the past two months. I'm still work on this, and I will submit new patch within the a week. |
Hi @donghao526 , Thanks very much for your effort and contribution! ❤ You could do this anytime when you have time. Your personal affairs are always in first priority and we don't need to do it in a rush. 😊 Best Regards, |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements the TDIGEST.REVRANK command, which returns the reverse rank (number of elements greater than the given value) for one or more values in a t-digest data structure. The implementation includes support for handling edge cases like empty digests, values outside the range, and duplicate values.
Key changes:
- Adds the
TDigestRevRankalgorithm to compute reverse ranks for input values - Refactors merge logic by extracting a reusable
mergeNodesmethod - Adds comprehensive test coverage in both Go integration tests and C++ unit tests
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/types/tdigest.h | Defines the TDigestRevRank template function and helper for rank assignment |
| src/types/redis_tdigest.h | Declares the RevRank and mergeNodes methods in the TDigest class |
| src/types/redis_tdigest.cc | Implements RevRank, refactors mergeNodes, and fixes iterator validation logic |
| src/commands/cmd_tdigest.cc | Adds the CommandTDigestRevRank command handler and registration |
| tests/cppunit/types/tdigest_test.cc | Adds C++ unit tests for various RevRank scenarios |
| tests/gocase/unit/type/tdigest/tdigest_test.go | Adds Go integration tests for TDIGEST.REVRANK command |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/types/redis_tdigest.cc
Outdated
| // because we must guarantee the iterator to be inside the valid range before iteration. | ||
| bool Prev() { | ||
| if (Valid() && iter_ != centroids_.cbegin()) { | ||
| if (Valid()) { |
Copilot
AI
Oct 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After calling std::advance(iter_, -1), the iterator could become invalid (move before cbegin()). The validity check on line 79 will return true even when iter_ is before cbegin(), allowing subsequent operations on an out-of-bounds iterator. The validity check should occur before the advance operation, and the advance should only happen if iter_ != centroids_.cbegin().
| if (Valid()) { | |
| if (Valid() && iter_ != centroids_.cbegin()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @donghao526 ,
Thanks very much for your effort! ❤
Left with some comments.
Please also go thorugh the comments left by github-copilot.
Best Regards,
Edward
src/types/redis_tdigest.cc
Outdated
| return Valid(); | ||
| } | ||
| bool Valid() const { return iter_ != centroids_.cend(); } | ||
| bool Valid() const { return iter_ < centroids_.cend() && iter_ >= centroids_.cbegin(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that iter_ should always greater or equal to the cbegin().
| } | ||
| *output = redis::Array(rev_ranks); | ||
| } else { | ||
| *output = redis::BulkString("nan"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refer to tdigest.revrank.
All values are -2 if the sketch is empty.
| inline Status TDigestRevRank(TD&& td, const std::vector<double>& inputs, std::vector<int>& result) { | ||
| std::map<double, std::vector<size_t>> value_to_indices; | ||
| for (size_t i = 0; i < inputs.size(); ++i) { | ||
| value_to_indices[inputs[i]].push_back(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is hard to compare two double number for a map. we need a stable way for the compare operator.





ISSUE
It closes #3063.
Proposed Changes
Add TDIGEST.REVRANK command implementation
Add cpp unit tests